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2 INTRODUCTION 



1 Abstract 



The general linear model (GLM) is a well established tool for analyzing functional magnetic res- 
onance imaging (fMRI) data. Most fMRI analyses via GLM proceed in a massively univariate 
fashion where the same design matrix is used for analyzing data from each voxel. A major lim- 
itation of this approach is the locally varying nature of signals of interest as well as associated 
confounds. This local variability results in a potentially large bias and uncontrolled increase in 
variance for the contrast of interest. The main contributions of this paper are two fold (1) We 
develop a statistical framework called SMART that enables estimation of an optimal design matrix 
while explicitly controlling the bias variance decomposition over a set of potential design matrices 
and (2) We develop and validate a numerical algorithm for computing optimal design matrices for 
general fMRI data sets. The implications of this framework include the ability to match optimally 
the magnitude of underlying signals to their true magnitudes while also matching the "null" signals 
to zero size thereby optimizing both the sensitivity and specificity of signal detection. By enabling 
the capture of multiple profiles of interest using a single contrast (as opposed to an F-test) in a 
way that optimizes for both bias and variance enables the passing of first level parameter estimates 
and their variances to the higher level for group analysis which is not possible using F-tests. We 
demonstrate the application of this approach to in vivo pharmacological fMRI data capturing the 
acute response to a drug infusion, to task-evoked, block design fMRI and to the estimation of a 
haemodynamic response function (HRF) response in event-related fMRI. Although developed with 
motivation from fMRI, our framework is quite general and has potentially wide applicability to a 
variety of disciplines. 



2 Introduction 



General linear models (GLMs) with Gaussian noise are very popular tools for fMRI model-based 
analyses [7j. The design matrix (DM) for GLM analysis is usually based on the stimulus paradigm 
used during the experiment. With each column or explanatory variable (EV) of the DM is associated 
a parameter estimate (PE) measuring the strength of that EV in the overall model fit. The 
investigator defines linear contrasts of interest to extract meaningful values reflecting aspects of the 
brain's response to the applied paradigm. Since fMRI data is composed of thousands of measured 
timeseries across different points or voxels in the brain (~30000 voxels is typical), GLM based 
analysis for fMRI proceeds in a massively univariate way, meaning that the same DM is used to 
analyze all voxels. 

One very attractive property of the PE's estimated using GLM is that they are unbiased and 
of minimum variance if the DM is correctly specified (Gauss-Markov theorem) [9]. However, the 
exact mechanism underlying fMRI signal generation is extremely complex and in fMRI data the 
'true' signal of interest is often superimposed with various artifactual signals due to physiology, 
motion and possible scanner effects. Moreover, the true temporal profile of the signal of interest 
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may not be constant across brain regions or subjects; that is, there might be a range of temporal 
response profiles induced by the same paradigm. Thus, the assumption of a correctly specified 
model using a single DM for all voxels often does not hold in real fMRI data. In view of this fact, 
when using a GLM framework to analyze such data, one must have a good handle on the bias and 
variance of imperfect PE's calculated using the mis-specified DM. This is an extremely important 
point that cannot be ignored in fMRI analysis especially because the implications of Gauss-Markov 
theorem do not hold for mis-specified DMs. If the bias and variance introduced in the PE's at 
the first (individual subject) level are uncontrolled then misleading results can be obtained when 
generalizing to a group of subjects. 

Small modeling misspecifications can be corrected to a certain extent using simple approaches, for 
example, by adding the derivative of the main EV to the DM to capture small temporal shifts 
[5], [13]. However, these additional EV's are mostly based on heuristics and can still result in 
uncontrolled bias and variance of the resulting PE's. Basis function approaches result in more 
flexibility by allowing arbitrary response shapes to be matched via appropriately specified regressors 
|15] . It is possible to achieve a low variance fit to the data using these basis functions but it is 
difficult to define a meaningful contrast of interest that captures the underlying signal amplitude. 
In addition, the PE's estimated are again not controlled for bias or variance. Moreover, group 
analysis is non-trivial in this context as the first-level F-test results cannot simply be propagated 
to the group level. 

In this article, we wish to derive a general theoretical basis that enables computation of optimal 
DM's for GLM analyses as well as provide an algorithm that is practical to implement for practi- 
tioners. First, we develop an algorithmic framework called SMART (SiMultaneous biAs vaRiance 
and Residual opTimization) to derive automatically both a meaningful contrast and a DM simul- 
taneously or, given a specified contrast derive a DM suitable for use at all voxels that models 
the set of all potential DM's in an optimal way, capturing a wide range of potential signals of 
interest while controlling for both the bias and variance of the signal amplitude measure. This 
explicit optimization will automatically optimize both the sensitivity and specificity of detecting 
signals of interest in the data. Second, we apply the framework to specific case studies arising from 
both pharmacological challenge and task-evoked fMRI experiments. We apply the optimal design 
matrices to the real fMRI data and demonstrate the more robust detection of fMRI responses in 
vivo. 

3 Development of SMART 

3.1 Performance measure for a single design matrix 



To start let us assume that the true model generating the data is 

y = Xp + e 



(1) 



3.1 Performance measure 
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where X G R"^'^, /3 G R'^, y G R" and e ~ A^(0,cj^/„). [If the noise is non-white then the same 
discussion in this section apphes after an initial pre- whitening step. A complication is that there 
might be interaction between pre-whitening and model misspecification.] 

Unfortunately we do not know what X is and so we use a design matrix Z G R"^p for analyzing 
the data generated by the above model. 

y = Zj + ei (2) 
where ei ~ N{0,afln)- The usual GLM estimates are: 

7=(Z^Z)-iz'^y (3) 

and 

Coi{j) = al{Z^Z)-' (4) 

where 

al = - - Zj) 

n — p 

For a contrast of interest cx G R*^ for the true model [T] let cz G R^ be the corresponding contrast 
of interest in the proposed model [2} 

It can be shown that the following holds under the true model: 



where 



and 



E{^) = {Z^Z)-^Z^X(3 

0"^ n — p 

^ p^X^PzXp 

Pz = In- Z{Z^Z)'^Z^ 



(6) 
(7) 

(8) 
(9) 



When Z = X in the above then we recover the usual GLM quantities. 

To answer the question, how does c^j compare with we define the following 

Define the normalized contrast bias Ch (assuming ^ 7^ 0) as follows: 

_ clE{^) - cl(3 _ cl{Z^Z)-'Z^X{fila) 
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This measures bias as a fractional change in the PE of interest from the true value. When § = 0, 
Cb becomes undefined. In this case we define it as the numerator in the above equation, i.e., 
Cb = cl{Z'^Z)-^Z'^X{P/a) 

Define the normalized model variance bias Vb as follows: 



Define the normalized contrast variance change with respect to the Gauss-Markov estimate as 
follows: 

_ E{af/a')cl{Z^Zr^cz ^ 



(12) 



The test statistic of interest is: 



r(7,c7i;Z;cz) = , " ^ ^ (13) 



Since 7 and di are independent: 

Ideally we would like the misspecified model to perform well, i.e, 0^7 be as close to c^(3 as possible 
and at the same time have as small estimated variance as possible. This bias-variance tradeoff is 
captured in the function 

F = ^lcl{Z^Zr^cz + l,{E{cl^ - clP)f (15) 

Note that this function captures simultaneously not only the bias and variance of the contrast of 
interest but also the full model residual (via ai). This fact will be exploited in the next section. 
The expected value of the above under the true model can be written as: 

E{F) = {Vb + l)cl{Z'^Z)-\z + {clPlafCl (16) 



All the above definitions are functions of the signal to noise ratio (/3/cr) in the data. 



3.2 Calculating optimal design matrices 
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3.2 Calculating optimal design matrices 

In this section we will set up the optimization problem that will enable us to compute optimal design 
matrices for arbitrary data sets. Ideally we would like the PE's from our optimal design matrix 
to have nice properties such as a low bias, low variance as well as a "low residual" overall model 



fit. It can be seen that 15 will be small when a candidate DM satisfies these ideals as compared 



to another that does not. Hence 15 is a joint performance measure that captures all attributes 
of interest in one function for a given design matrix X. How do we generalize this concept to 
enable good performance of the optimal DM over a range of candidate DMs? Suppose our data 
is expected to contain the m design matrices Xi,X2, ■ ■ ■ ,Xm- Matrix Xi is of size n x pi, where 
Pi is the number of regressors in Xi. Suppose noisy data is generated from Xi at SNR /Jj/di and 



suppose that the contrast of interest for Xi is cx^. Expanding 16 we get: 



/(Z, cz; Xf, ^; cxj = c'^ziZ^ zy^cz 

(Ti 



l + tr[Pz 



{n- Pi) erf 



[17) 



+ \cl{Z''Z)-^Z^Xil3,l3jxfZ{Z''Z)-^cz - ^cl{Z''Z)-^Z^Xi(3^/3fcx, + (c^^Pi/ai)^ 



<77 



<77 



Suppose weights 11)1,11)2, ... , Wm measure the frequency of occurance of each DM Xi in the data 
such that higher values of Wi indicate a higher frequency and Y^l^i Wi = 1. The objective function 
of interest is the mean performance measure over all design matrices. Hence, we define the following 
composite objective function (leaving off the multiplier ^,,1"'' ^ ): 



G{Z,cz) = ^Wif{Z,cz;Xi; — ;cxj 

^ <7i 



Define the quantities: 



H 



and 



i=l 



n-pi 







V ... ^ 

fw{Xif5i y/WiXi(3i Jw^XmPr^ 



0-1 



Cfi 



0-1 



fwic^,f3i 



(Ti 



(18) 



(19) 

(20) 
(21) 



With these definitions the composite objective 18 can be written as: 



3.3 Local control of bias and variance 
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G{Z,cz) = cl{Z^Z)-^cz 



c%{Z^Zy^Z^HH'^Z{Z'^Z)~^cz - 2c^{Z'^Z)-^Z^Hi + Wi{cl^pi/a^ 



(22) 



(23) 



In general, one can put constraints on the columns of Z (e.g., fixing certain columns) such as: 

ZA = B, (24) 

where A £ EP^'^ and B G i?"^'^ are fixed matrices. Similar constraints can be imposed on the 
contrast vector: 



Ccz = d 



(25) 



where C G i?'"^^ is a fixed matrix and d G R' is a fixed vector. 

Our goal is to minimize the composite objective function G{Z, cz) that measure the weighted 
bias/variance decomposition over all potential DMs in the data. Hence, the complete optimization 
problem is written out as: 



Z,cz = arg min z,czG{Z,cz) (26) 
s.t. ZA = B (27) 
s.t. Ccz = d s.t. rank(Z) = p (28) 

The last constraint above simply fixes the rank of Z or the number of independent columns in 
Z. 



3.3 Local control of bias and variance 

It is straightforward to extend the concepts developed above to attain a local control of bias- variance 
decomposition i.e., to weigh the contribution of bias and variance terms to the overall performance 



measure for each DM Xj. The first step is modifying 17 to accomodate user defined bias/variance 



weighting by introducing a parameter for each Xi and rewriting the performance measure for 



3.3 Local control of bias and variance 
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Xj as follows: 



Pi 



f{Z, cz; Xf, ^; cx,;(t>i) = 2<Pi ci,{Z' Zy'cz 



l + triPz 



XAPfxl 



{n-pi)al 



+(2 - 2^i) ( \cl{Z^Z)-^Z^XA0fxfZ{Z^Z)-^cz - \cl{Z^ Z)-^ Z^ XA0f cx, + (c^A/a, 

(Tf (Jf 



The parameter (pi G (0, 1) controls the relative importance of the bias and variance terms. When 



= 0.5, both terms are equally weighted as in 17, Higher values of give higher weight to the 
variance term and lower values of (pi give higher weight to the bias term. The composite objective 
function for local bias-variance weighting is defined as before: 



G^{Z,cz) = ^Wif{Z,cz;Xi; —]cx,](pi) 



i=l 



Define diagonal matrices $y and as follows: 



(30) 



$1 



and 



/ 201 



V ... 

(2-2(1)1 



V 

With these definitions Ga, can be written as: 



2(/>ri 



2 - 20^ 



(31) 



(32) 



G^{Z,cz) = c^z{Z^Z)-^cz 



2PiWi + tr {PzH^v^H 



i=l 



(33) 



cI{Z^Z)-^Z'^H^bH''Z(Z^Z)-^cz - 2cl{Z'^Zy^Z^H^B^ + ^^(^ - 20,)(c£ AM)' (34) 



This approach can be easily extended to the simultaneous optimization of multiple contrasts using 
the function: 



q 

G^{Z,cz,,...,czJ = ^G0(Z,czJ 



(35) 
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4 Algorithm 



4.1 Implementation 

In this section we describe simplified optimization strategy that seems to work for the nature of 
the problem under consideration. Basically it involves simple gradient descent steps with adaptive 



step sizes. This practical algorithm is summarized in Algorithm 1 



Algorithm 1 Algorithm for optimizing DM 



Require: Problem variables H, S, £, A, C 

Require: Algorithmic variables G (0, 10~^), 6 G (1,5] and ?7i,ry2 G (0, 10~®) 
Require: The size of optimal DM, p and Initial point Zq, czq satisfying ZqA = B and Ccz^ = d 
Ensure: Outputs are the optimal DM, Z, the optimal contrast, cz and the optimal objective 
function, F 

Compute orthogonal projectors Pa = Ip - A{A'^ A)''^ A'^ and P^t = Ip - C'^{CC'^)~^C 
found = 0, j = 
while found = do 

Let S, = lg{Z„cz^) and Tj = §g{Zj,cz^) 
success = 



while success = do 

8: cz,^i = cz, - ajPcrTj 

9: Fj = G{Zj,cZj) and F^+i = G{Zj+i, cz^^^) 

10: if Fj+i < Fj then 

11: CKj+i = 9aj 

12: success = 1 

13: else 

14: aj = aj/9 

15: end if 

16: end while 

17: if — FjW < 7]i or aj+i < r]2 then 

18: found = 1 

19: else 

20: j =j + l 

21: end if 

22: end while 

23: return Z = -Zj+i, c'z = czj^-^ and F = i^j+i 
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Xi, — ,cx, 

(T, 



IT 



31 



IE 



0n 



f{Z,cz-Xi\—;cx,\4>i) 



f{Z, cz;X,:^:Cx.;4) 



f{Z, cz ; X,„ ; — ; cx,„ ; <?)„i) 

O'm 



Wi 



rz) = ^ -^z: -; cv, ; 0.) 



I]7 




Constraints 

= B 
Ccz = d 



I'gminz.cz [Z, cz) \^ — V 



Performance 
Evaluation 



Optimal Z , cz 



«.', = User defined design matrix weiglits 

4>i = User defined bias-variance scalings 

Xi = Potential design matrices in data, i = 1, . . . , m 

& = SNR for Xi 

cx, = Contrast of interest for Xi 



Figure 1: Flowchart for computing optimal design matrices. Design matrices Xj, contrasts of 
interest cx^ and signal to noise ratios ^ are locally weighted using bias-variance weighting to 

compute performance measures f{Z,cz',Xi;^; cx^, 4>i)- These performance measures are combined 
using weights Wi to form the objective function G<^(Z, c^). Function c^) is then optimized 

with respect to Z and cz subject to user defined constraints to yield the optimal design matrices 
Z and Cz- A post-processing step then generates performance curves for user inspection. 
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The gradients of are given by: (see the appendix 1 1 1 • 1 1 for detailed derivation) 
-Z{Z^Z)-\2czcl){Z^Z)-^ 



dG 
dZ 



2(l),Wi + tr {PzH^v^H^) 



.i=l 



-2{cl{Z^ Z)-^cz)PzH<^v^H^ Z{Z'^ Z)-^ 
-2Z{Z'^ Z)~^czcl{Z^ Z)-^ Z^ H<^>bH^ Z{Z^ Z)~^ 
-2Z{Z^Z)~^{Z^H^bH'^Z){Z^Z)-'^czcI{Z'^Z)-^ 
+2H^bH^ Z{Z^ Z)-^czcl{Z'^ Z)-^ 

+2Z{z'^ z)-'^ z'^ H^B^cliz^ zy^ 
+2Z{z'^ zy^'czf'^BH'^ z{z'^ zy^ 

-2H<^B^cl{Z^Zy^ 



(36) 



dG 
dcz 



2{Z^Zy^cz 



'^(kiWi + tr [PzH^v^H^) 

.i=l 

T r7\ — l ryT TT^ _ TjT ry I ryT ryX—X^ 



(37) 



+2{Z^Zy^Z^H^Bii^Z{Z,^Zy^cz 
-2{Z'^Zy^Z^H<^i 



When optimizing over multiple contrasts as per 35 the gradients are given by: 



dG^ 
dZ 



^ dG^{Z,czJ dG^ _ dG^{Z,cz^ 

s=l 



dZ 



dcz. 



dcz. 



(38) 



4.2 Validation 



We validate the practical approach by comparing optimal solutions from the more sophisticated 
solver with the ones produced using the practical solver. Our state-of-the-art optimization solver 
(see appendix) was used to solve the validation test problems. The optimization core uses an aug- 
mented lagrangian algorithm (inspired by the implementation in LANCELOT package [3], [5]) to 
solve equality constrained problems. Inequality constraints are handled by first transforming them 
to equality constraints via slack variables and solving the resulting bound constrained optimization 
problem. Some features of interest are as follows: 

1. A Trust region based approach [lOJ is used to generate search directions at each step (for 
both equality constrained and inequality constrained problems) . 



4.2 Validation 
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2. For equality constraints only, the subproblems above are solved using a conjugate gradient 
approach (Newton-CG -Steihaug) [H] that is fast and accurate even for large problems and 
can handle both positive definite and indefinite Hessian approximations. If both equality 
and inequality constraints are present then we solve the trust region problem with a non- 
linear gradient projection technique |2j followed by subspace optimization using Newton-CG- 
Steihaug. 

3. A symmetric rank 1 (SRI) quasi-Newton approximation to the Hessian [4j is used which is 
known to generate good Hessian approximations for both convex and non-convex problems. 
As suggested in [12^ we do the update also on the rejected steps to gather curvature informa- 
tion about the function. We provide options for BFGS [L especially for convex problems and 
an option for preconditioning the CG iterations. We also implement limited memory variants 
of SRI and BFGS for large problems. 

4. Our algorithm accepts vectorized constraints so that multiple constraints can be programmed 
simultaneously. Only gradient information is required. Hessian information is optional but 
not required. 

We tested the performance of our algorithm using standard optimization benchmarks from the 
GAMS performance benchmark problems (http://www.gamsworld.org/performance, ^). The 



appendix 11.8| provides more technical details of the algorithm. 



4.2.1 Validation Test A 

Motivated by a practical data-set that we later describe we consider for illustration purposes the 
case study when the profiles of interest are shifted relative to a base profile by variable units and 
our goal is to simultaneously capture all responses with a single design matrix. To test and validate 



the optimization framework, we used the basic design matrix Xq from figure 27, m = 50 expected 
design matrices were proposed with 



Xi(:,l) = Xo{:, 1) shifted right by i timepoints (39) 

X,(:,2)=Xo(:,2) (40) 

We chose ^ = [1,0.5]'^ and = [1,0]-^, Vi. The weights were chosen as Wi = to reflect the 
equal likelihood of observing any Xi. We chose (pi = 0.5, Vi in this validation test. 

The rank of Z was chosen to be 4 and the matrix A was chosen as ^4 = [ei, 62] where ei G -R^ is a 
unit vector with 1 at position 1 and zeros elsewhere. Similarly for 62. The matrix B was chosen as 
Xq to fix the first two columns of Z to those of Xq. 

C was chosen as the identity matrix I\ and d was set to [1, 0, 0, 0]"^ to fix the contrast cz. 



4.2 Validation 
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Figure 2: Validation Test A: Convergence diagnostics for the advanced solver. Figure shows the 
evolution of objective function, the Lagrangian, norm of the gradient of the Lagrangian, norm 
of the constraint satisfaction error, norm of the Lagrange multipliers and progress monitoring 
parameter over algorithm iterations. The last row shows the optimal solution (i.e., the DM and 
contrast) displayed as a vector and the optimal Lagrange multipliers for the chosen constraints on 
the columns of Z and the contrast cz- The optimal objective of G{Z,cz) = 2.693 was attained in 
74 iterations. 



4.2 Validation 
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2DD 300 



500 EOO 













SC?' 



(a) Initial design matrix Zo 



(b) Optimal design matrix Z 



Figure 3: Validation Test A: (a) Initial design matrix Zq and (b) Optimal design matrix Z (m = 50). 
The first two EVs were constrained to their shift values. The contrast cz was constrained to be 
[1,0,0,0]^. 



4.2.2 Validation Test B 

In this case, the contrast vector cz was left unconstrained. Everything else is the same as in Valida- 
tion Test A. Convergence diagnostics and optimal Z for this case are shown in Figure [4] and Figure [H] 
respectively. The optimal contrast was determined to be c'z = [0.73519; 0.47890; 0.76016; 0.75789]. 





Exact 


Algorithm 1 


Case a 


2.693467 


2.693490 


Case b 


0.265484 


0.265502 



Table 1: Optimal objective values for the Exact algorithm and Algorithm 1 for Case a and Case b. 
The optimal contrast using Algorithm 1 was found to be cz = [0.73523; 0.47890; 0.75704; 0.75844] . 



4.2 Validation 
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Figure 4: Validation Test B: Convergence diagnostics for the advanced solver. Figure shows the 
evolution of objective function, the Lagrangian, norm of the gradient of the Lagrangian, norm 
of the constraint satisfaction error, norm of the Lagrange multipliers and progress monitoring 
parameter over algorithm iterations. The last row shows the optimal solution (i.e., the DM and 
contrast) displayed as a vector and the optimal Lagrange multipliers for the chosen constraints on 
the columns of Z and the contrast cz- The optimal objective of cz) = 0.2655 was attained in 
8 iterations. 



4.2 Validation 
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m ZW 300 400 500 600 1DD 2DD 300 400 500 600 



(a) Initial design matrix Zq (b) Optimal design matrix Z 

Figure 5: Validation Test B: (a) Initial design matrix Zq and (b) Optimal design matrix Z (m = 50). 
The first two EVs were constrained to their shift values. The contrast cz was left unconstrained. 




(a) Initial design matrix Zq (b) Optimal design matrix Z 

Figure 6: Validation Test A: (a) Initial design matrix Zq and (b) Optimal design matrix Z 
(m = 50). The first two EVs were constrained to their shift values. The contrast cz was 
constrained to be [1; 0; 0; 0]. The problem was solved using Algorithm 1. The optimal objective 
was G{Z,cz) = 2.693490. 
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(a) Initial design matrix Zo (b) Optimal design matrix Z 



Figure 7: Validation Test B: (a) Initial design matrix Zq and (b) Optimal design matrix Z {m = 50). 
The first two EVs were constrained to their shift values. The contrast cz was left unconstrained. 
The problem was solved using Algorithm 1. The optimal objective was G{Z,cz) = 0.265502 



5 Algorithmic issues 



5.1 Note on initialization 



It is well known that when finding a local solution to an optimization problem as we do here, the 
choice of an initial point could have an impact on the estimated local solution. We acknowledge 
that there could be many interesting initialization strategies. Here we propose one such strategy 
for initialization of Z and cz- We recommend a heuristic strategy for initialization of the primary 
column in Z. First we try to find a vector v that is closest to primary columns in Xi in the following 
sense: 



argmin „ X] II ( c^, — ) {XiCx,) 



T Pi \ M|2 
CX,- ) ^^)ll2 



(41) 



The solution to 41 is given by: 



(42) 



5.2 Esthimiing the size of Z 
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Next we form the nx m matrix M = [Mi, . . . , Mj, . . . M^] of residuals with ith. column: 



Mi 



cx,— ) [Xicxj - ( Cx, 



(43) 



Next we take the singular value decomposition of M to get: 



(44) 



where Um £ R^^™ is a matrix of left singular vectors, Vm £ R_™x™- ig a matrix of right singular 
vectors and Em holds the singular values of M. 

For optimizing a j3 column matrix Z wc choose (in matlab notation) the following initializa- 
tion: 



Zo{:,l) = v 
Zo{:,2:p) = Um{:,1: {p - I)) 
cz„ = [1;0; . . . ;0] {p rows) 



(45) 
(46) 
(47) 



Thus the vector v is used to initialize the primary column of Zq and the columns of Um are used 
to initialize the non-primary columns of Zq. The next step is to modify Zq and czq so that they 
satisfy the constraints Cczg = d and ZqA = B. 



Zo = Zo + {B-ZoA){A^A)-'A^ (48) 
czo = czo + C^iCC'^rHd - Ccz,) (49) 

This is not the only way to initialize Z, there can be many other strategies. In fact we have not 
used this initialization strategy in many of the examples presented here precisely to illustrate this 
point. 



5.2 Estimating the size of Z 

By accounting for expected variations in the shape and size of the response and then optimizing for 
a design matrix Z via the solution of an inverse problem that explicitly controls for bias and variance 
we automatically avoid overfitting in this framework. The framework also allows for inclusion of 
"null" data that is not simply Gaussian noise but is some structured signal such as a drift (see 
Example 3) to explicitly instruct the optimization process to "equate" it to "no signal" during 
optimization. Why then is it important to choose the size of Z? One reason is to maximize the 
degrees of freedom available for subsequent first level or higher level statistical tests. We propose 
the following strategy for choosing the "optimal" number of columns in Z. 



5.2 Estimating the size of Z 
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Algorithm 2 Choosing optimal number of columns in Z 



Run 



Algorithm 1 to estimate Z, cz and F for the current value of p 



Require: Problem variables H, S, £, A, C 

Require: Algorithmic variables oq G (0, 10~^), 6 E (1,5] and r]i,ri2 £ (0, 10^^) 
Require: Initial choice p = po and Pmax the maximum value of p 

Require: User chosen accuracy cutoff Rc € (0.5, 1) with a default value of Rc = 0.95 (95% cutoff) 
Ensure: Outputs are the optimal number of columns in the DM popt 
1: Set j = 1 

2: for p = Po to Pmax do 

Require: Zq, czn sat isfying ZqA = B and Cczq = d such that Zq and cz have p columns 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 



Set PestU) = P 

Set F{j) = F, Z{j) = Z and cz{j) = cz 

J = J + 1 
end for 

Compute Fynax = maxjF{j) and Fmin = minjF{j) 
Set j = 1 

for p = pQto Pmax do 

-R(j ) — i,Fmax F iyjy) I iyFmax Fmin) 
j=j + l 

end for 

Calculate the minimum j meeting the cutoff, j = min{j : R{j) > -Rc} 
return popt = PestU) 
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5 ALGORITHMIC ISSUES 



The basic idea in Algorithm 2 is to run Algorithm 1 for a range of values of p and choose a value 



of p that achieves a user chosen reduction in the objective function value relative to the maximum 
possible reduction over all values of p. Please note that the strategy for choosing the number of 
columns proposed here is by no means the only one. For example, it could also involve reduction 
in the model variance Vf, below a specified user value. If the R vs p curve has a local maximum 
as opposed to a monotonic increase then the location of this maximum also is a reasonable choice 
for the optimal size of Z. In principle one can correct for variability due to initialization using 



Algorithm 3 that essentially runs Algorithm 2 for a number of initializations 



Algorithm 3 Choosing optimal number of columns in Z - Sensitivity to Initialization 



Require: Problem variables H, S, £, A, C 

Require: Algorithmic variables oq G (0, 10~^), 6 € (1,5] and ?7i,?y2 £ (0, 10" 
Require: Initial choice p = po and Pmax the maximum value of p 

Require: User chosen accuracy cutoff Rc G (0.5, 1) with a default value of Rc = 0.95 (95% cutoff) 
Require: User chosen number of trials nuer 

Ensure: Outputs are the optimal number of columns in the DM Popti'n'iter) over nuer runs 
1: for j = 1 to niter do 



Run Algorithm 2 to get popt for this run 



Set PestU) = Popt 

end for 

Poptiriiter) = MedianjPestU) 
return Poptinner) 



We ran Algorithm 2 for Validation Test A and the results are shown in Figures 8][9 It was found 
that Popt = 3 using a cutoff of Rc = 0.95 for this case study. 



5.3 Intelligent choice of 



So far we have not discussed sensible choices of (pi for design matrix Xi. Is a constant (pi for all i 
the best choice? In this section we wish to motivate some heuristic rules for selecting (pi based on 
user defined objectives. Equation 29 can be re-written using definitions of Cb and CVa for Xi from 
[inland [12] as follows: 

2" 



f{Z,cz;Xi; ^■,cx■A^) = {{2(Pi)cl^{XlXi)-^cx,]{CVM + 1) + < 




Cl (50) 



The contribution of fractional contrast bias to the objective function is controlled by its multi- 



pher <^ (2 - 20i) 



> . Similarly, the contribution of fractional variance change w.r.t Gauss- 



Markov estimate term (CVaj + 1) is controlled by its multiplier {^{2(l)i)c^^{Xj Xi) ^cx^^- 
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5 ALGORITHMIC ISSUES 




(a) p versus F (b) p versus F zoomed in 

Figure 8: Illustration of the procedure to estimate the size of Z using data from Validation Test 
A (a) Number of columns p versus the optimal objective F (b) Same as (a) but showing the slow 
decline of the tail in (a) 




3 4 5 6 7 

p = Numbef of columns in Z -> 



Figure 9: Data from Validation Test A showing R versus p curve and the cutoff at 0.95 indicating 
that popt = 3 
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5 ALGORITHMIC ISSUES 



5.3.1 Choice A 



Suppose the user wants to put k times more emphasis on the term compared to {CVai + 1) 
term. Then a sensible choice would be: 



(2 - 2(Pi 



/c{(20,)c£(XfX,)-^cxJ 



(51) 



Solving for 4>i and denoting the calculated value by <pi{k) to indicate A; times more emphasis on C^- 
term compared to {CVai + 1) term, we get: 



4. ft 



c^.ft 



kcuxrx,)-^cx. 



It is clear that as k ^ oo, 4>i{k) — > and as A; — > 0, (/>i(fe) — 1. This behavior is as expected 
but interestingly it is non-linear. In particular, choosing k = 1 i.e., equal emphasis on C^- and 
(CVa* + 1) terms does not necessarily imply (f>i{l) = 0.5. Choosing sensible values of k boils down 
to sifting through the optimal solutions and picking ones that make practical sense. In general one 
wants C^- ~ 10~^ and CVa ~ 0. Thus it makes sense to choose k = 10^ to make both terms in the 
optimization of a similar magnitude while satisfying reasonable practical objectives. 



5.3.2 Choice B 



Suppose the user wants to put k times more emphasis on the \Cbi\ term compared to {CVai + 1) 
term. In this case a sensible choice would be: 



V(2 - 2</.i 



k{{2ct>i)c^^{Xj'Xir'cx,} 



(53) 



Squaring both sides and solving the resulting quadratic equation and discarding the negative root 
results in: 

-1 + ^l + 4a(A;) 



where 



a{k) = 



2a{k) 

2{cl^{XfX,)~^cx^fk^ 



4.ft^ ^ 



(54) 
(55) 
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6 CASE STUDIES 



It can be shown that as A; — > oo,a oo, (j)i{k) and as ^ 0,a — > 0,(j)i{k) 1. Again, this 
behavior is to be expected. As before, choosing k = 1 i.e., equal emphasis on \Cbi\ and {CVai + 1) 
terms does not imply (pi{l) = 0.5. Practically speaking one wants \Cbi\ ~ 10~^ and CVa ~ and 
hence a reasonable value of k for choice B would be A; = 10^ to make the two terms comparable in 
magnitude during the optimization process. 



An example of the non-linear relationship between log k and is shown in Figure 10 




Figure 10: Plot of log A; vs (j) for design matrices Xi, X25 and X50 from Example 1. This figure 
shows that as A; ^ 00, (/> — > and as A; ^ 0, i?i> ^ 1 but the relationship is non-linear, i.e., choosing 
i;^ = 0.5 does not give equal emphasis to bias and variance terms {k 7^ 1). 



6 Case Studies 

In this section we present performance curves for the estimated optimal DM's for six example cases. 
In particular, we look at the contrast bias Cb, the model variance bias and contrast variance 
change w.r.t the Gauss-Markov estimate CVa as key performance measures. 



In the first 3 examples, Xj, were chosen as in 39 In the Example 4 additional Xi were added 

to those in Examples 1-3 for illustration purposes. Elxample 5 deals with a new set of Xi derived 
from the standard block design used in fMRI experiments. Example 6 illustrates the application 
of proposed technique to the problem of capturing variable shapes of the Haemodynamic response 
function (HRF) in fMRI. 



6.1 Example 1 



24 



6 CASE STUDIES 



6.1 Example 1 

In this example, we used the default weighting of bias and variance by choosing (j) = 0.5. The 
first two columns of Z were fixed as before and the contrast cz was fixed at [1;0;0]. We also chose 
Wi = 1 to give equal weights to all design matrices. The unconstrained column in Z was initialized 
randomly with elements drawn from a uniform distribution C/(0,1). The results are shown in 



Figures and 12 



6.2 Example 2 

This example is the same as Example 1 except for two differences: 



1. First we choose (pi automatically using the strategy proposed in Section 5.3 We used Option 
A to initialize (p using k = 10^. 

2. Second, we initialize the columns of Z automatically using the strategy proposed in Section 

EH 



The results are shown in Figures 13 and 14 



6.3 Example 3 

In this example, we attempt to investigate the effect of bias- variance weightings on the performance 
curves. We put a higher emphasis on reducing bias by choosing (j) = 0.1. Please note that the 
relationship between the value of (pi and the importance of bias or variance terms is non-linear (see 
section 5.3). The first two columns of Z were fixed as before and the contrast cz was fixed at [1;0;0]. 
We also chose Wi = 1 to give equal weights to all design matrices. The unconstrained column in Z 
was initialized randomly with elements drawn from a uniform distribution f7(0, 1). The results are 



shown in figure 15 and 16 



6.4 Example 4 

In this example, we attempt to investigate the effect of bias- variance weightings on the performance 
curves as well as the effect of optimizing the entire design matrix Z. We put a much higher emphasis 
on reducing bias by choosing (p = 0.01. As before, note that the relationship between the value of 



(pi and the importance of bias or variance terms is non-linear (see section 5.3). The contrast cz 
was fixed at [1;0;0] but the design matrix Z was left unconstrained. We also chose Wi = 1 to give 
equal weights to all design matrices. The potential DM set from before was augmented with the 
following additional DM's 

1. Xi,cXi same as before but ^ = [— 1;0.5] 



6.4 Example 4 
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6 CASE STUDIES 




Figure 11: Example 1: (a) Initial design matrix (DM) along with random initialization of the 
3rd column. The first two columns were fixed at their initial values and the contrast was fixed at 
[1;0;0] (b) Estimated optimal DM. Notice how the 3rd column converges to a non-random profile (c) 
Performance curves showing the fractional contrast bias Cb, contrast variance change w.r.t Gauss- 
Markov estimate CVa and model variance bias Vf, (d) For all i, the DM weights Wi = 1 implying 
equal likelihood of observing any of the specified DM's and bias-variance scalings 0j = 0.5 for all i 
implying the default weighting of bias and variance. 
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(b) 



Design matrix X{25}, 1000 simulations at SNR = 1, S = 1 



1000 simulations 




Figure 12: Example 1: (a) Figure showing the evolution of objective function values ^^(Z, cz) over 
algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) Figure 
showing the variation in the step size a over algorithm iterations. Step size controlling parameter 
6 in Algorithm 1 was set to 6 = 2. For each design matrix (DM) Xi entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows an example of simulated data for DM X25 at 
SNR and the GLM fit using the optimal DM. It also shows the distribution of c^7 over 1000 
simulations. Figure (d) is a summary errorbar plot showing E{c^^) over 1000 simulations for data 
generated from each DM. The error bars represent unit standard deviation of c^T (not standard 
deviation of E{c^^) ) to quantify the variance in estimation via simulation. 
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0.02 - 
0.04- 



frac contrast bias 




5 10 15 20 25 30 35 40 45 

DM no -> 



frac contrast variance ctiange w.r.t Gauss-Markov estimate 




Figure 13: Example 2: (a) Initial design matrix (DM) initialized using the automatic strategy 
proposed in section |5.1[ The first two columns were fixed at their initial values and the contrast 
was fixed at [1;0;0] (b) Estimated optimal DM. (c) Performance curves showing the fractional 
contrast bias Cb, contrast variance change w.r.t Gauss-Markov estimate CVa and model variance 
bias Vb (d) For all i, the DM weights Wi = 1 implying equal likelihood of observing any of the 
specified DM's. The bias-variance scalings (pi were chosen using the automatic strategy proposed 
in section using k = 10^. 
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600 
iter number - 




(a) 



600 800 
iter number -> 



(b) 



Design matrix X{50}, 1000 simulations at SNR = 1, P. =1 




Design matrix X{50}, 1000 simulations at SNR = 1, P - 1 





30 
DM no - 



(d) 



Figure 14: Example 2: (a) Figure showing the evolution of objective function values ^^(Z, cz) over 
algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) Figure 
showing the variation in the step size a over algorithm iterations. Step size controlling parameter 
6 in Algorithm 1 was set to 6 = 2. For each design matrix (DM) Xi entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows an example of simulated data for DM X50 at 
SNR ^ and the GLM fit using the optimal DM. It also shows the distribution of c^7 over 1000 
simulations. Figure (d) is a summary errorbar plot showing E{c^^) over 1000 simulations for data 
generated from each DM. The error bars represent unit standard deviation of c^T (not standard 
deviation of E{c^^) ) to quantify the variance in estimation via simulation. 
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frac contrast bias 




frac contrast variance change w.r.t Gauss-Marl<ov estimate 




5 ID 15 20 25 30 35 40 45 50 



10 15 20 25 30 35 40 45 51 ' o 5 10 15 20 25 30 35 40 45 50 

DM no -> DM no -> 



(c) 



(d) 



Figure 15: Example 3: (a) Initial design matrix (DM) along with random initialization of the 
3rd column. The first two columns were fixed at their initial values and the contrast was fixed at 
[1;0;0] (b) Estimated optimal DM. Notice how the 3rd column converges to a non-random profile 
(c) Performance curves showing the fractional contrast bias Cb, contrast variance change w.r.t 
Gauss-Markov estimate CVa and model variance bias Vb (d) In this example Wi = 1 and (pi = 0.1 
indicating a higher weighting to bias term during optimization. 
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(a) 



(b) 



Design matrix X{25}, 1000 simulations at SNR - 1, P, - 1 




1000 simuiations 



Design matrix X{25}, 1 000 simulations at SNR - 1 , P, - 1 




.Example 2: SNR - 1,3 - 1 



(d) 



Figure 16: Example 3: (a) Figure showing the evolution of objective function values ^^(Z, cz) over 
algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) Figure 
showing the variation in the step size a over algorithm iterations. Step size controlling parameter 
6 in Algorithm 1 was set to 6 = 2. For each design matrix (DM) Xi entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows an example of simulated data for DM X25 at 
SNR and the GLM fit using the optimal DM. It also shows the distribution of c^7 over 1000 
simulations. Figure (d) is a summary errorbar plot showing E{c^^) over 1000 simulations for data 
generated from each DM. The error bars represent unit standard deviation of c^T (not standard 
deviation of E{c^^) ) to quantify the variance in estimation via simulation. 
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6 CASE STUDIES 



2. Xi,cXi same as before but ^ = [1; —0.5] 

3. Xi,cXi same as before but ^ = [—1; —0.5] 

4. Xo,cxo same as before but ^ = [0; 1] 

This was done to constrain the sample space to make Z unbiased for sign changes relative to drift 
(1,2, and 3) as well as unbiased for pure drift (4). The 3rd column of Z was initialized to the 



optimal solution found in Example 1. The results are shown in figure 17 and 18 



6.5 Example 5 

We use a standard block design EV to illustrate application of the proposed technique. The EV 
consists of blocks of O's and I's (see |19(a) )The data might contain shifted or unshifted responses. 



The user estimates a maximum shift of 6 timepoints and would like to capture the variable data 
using an optimized design matrix. The user predominantly wants to control bias when the entire 
matrix Z is optimized for a fixed contrast cz = [1;0;0;0]. It is also desired to keep the main EV 
fixed as per the experimental paradigm and that the optimized contrast yield an unbiased estimate 
both for "positive" and "negative" activation. For this example, we chose p = 4 for the size of the 
optimal DM, the weights Wi = 1 and bias variance weights 4>i = 0.01. The set of potential DM's in 
this case capturing both "positive" and "negative" responses is thus: 

1. For i = 1, . . . ,6: Xi = basic block design EV shifted to the right by i timepoints, = [1] 
and ^ = [1] 

1. For i = 6, . . . , 12: Xi = X^.g, cx, = [1] and ^ = [-1] 



The results are shown in figure 19 and [20 



6.6 Example 6 

It is well known that there is no single haemodynamic response function (HRF) that captures the 
impulse response properties of all voxels in the brain. Here we illustrate the application of the 
technique developed in this paper to enable the simultaneous capture of a set of plausible HRF 
shapes. Plausible HRF shapes can be generated using any reasonable parameterization of HRF. 
Here we generate HRF shapes using the 5 parameter half-cosine parameterization as used in jl5j . 



The details of this parameterization of HRF are given in the Appendix 11.9 200 plausible HRF 



shapes were generated by sampling the 5 parameters from a uniform distribution as follows: 



hi = U{ls,3s) 
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Figure 17: Example 4: (a) Initial design matrix (DM). The 3rd column was initialized using the 
3rd col of Z found in Example 1. The first two columns were left unconstrained in this case but 
the contrast was fixed at [1;0;0]. The set of potential DM's from Example 1 and 2 was augmented 
by adding more DM's at SNR's of [1; -0.5], [-1;0.5], [-1; -0.5]. A "null" DM was also added at 
SNR [0; 1] and [0; — 1] indicating that "pure drift" should be matched to size "0". (b) Estimated 
optimal DM. Notice how the unconstrained columns 1 and 2 converge to non-intuitive shapes (c) 
Performance curves showing the fractional contrast bias Cb, contrast variance change w.r.t Gauss- 
Markov estimate CVa and model variance bias Vh (d) In this example Wi = 1 and (pi = 0.01 
indicating a higher weight to the bias term during optimization. 
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(a) 



(b) 



Design matrix X{75}, 1 000 simulations at SNR = -1 , P. = -1 




1000 simulations 



Design matrix X{75|, 1000 simulations at SNR = -1, P, - -1 



Example 3: 
-SNR^^I ,-1^0 



(d) 



Figure 18: Example 4: (a) Figure showing the evolution of objective function values ^^(Z, cz) over 
algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) Figure 
showing the variation in the step size a over algorithm iterations. Step size controlling parameter 
8 in Algorithm 1 was set to = 2 (c) , (d) For each design matrix (DM) entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows an example of simulated data for DM X75 at 
SNR ^ and the GLM fit using the optimal DM. It also shows the distribution of c^7 over 1000 
simulations. Figure (d) is a summary errorbar plot showing E{c^^) over 1000 simulations for data 
generated from each DM. The error bars represent unit standard deviation of c^T (not standard 
deviation of E{c^^) ) to quantify the variance in estimation via simulation. 
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frac contrast bias 



D.D8| — 
D.D6 - 
' 0.04 - 
0.02 - 




DM no -> 

frac contrast variance change w.r.t Gauss-Markov estimate 



DM no -> 
model variance bias 







(d) 



Figure 19: Example 5: (a) Initial design matrix (DM) along with initialization of columns 2 to 4 
of Zq using X2 to X4 respectively. The first column in Z was fixed to the basic block design EV, 
columns 2 to 4 in Z were left unconstrained and the contrast was fixed at [1;0;0;0] (b) Estimated 
optimal DM showing optimal columns 2 to 4 (c) Performance curves showing the fractional contrast 
bias Cb, contrast variance change w.r.t Gauss-Markov estimate CVa and model variance bias Vb 
(d) In this example Wi = 1 and (pi = 0.01 indicating a predominant weight to bias term during 
optimization. 
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(a) 



(b) 



Design matrix X{6}, 1000 simulalions at SNR - 1, B =1 




1000 simulations 



Example 4: 
-SNR=^1,-1 



(d) 



Figure 20: Example 5: (a) Figure showing the evolution of objective function values ^^(Z, cz) over 
algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) Figure 
showing the variation in the step size a over algorithm iterations. Step size controlling parameter 
8 in Algorithm 1 was set to = 2 (c) , (d) For each design matrix (DM) entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows an example of simulated data for DM Xq at 
SNR ^ and the GLM fit using the optimal DM. We also show a GLM fit at a higher SNR of 4 for 
illustration purposes. It also shows the distribution of c^j over 1000 simulations. Figure (d) is a 
summary errorbar plot showing E{c^^) over 1000 simulations for data generated from each DM. 
The error bars represent unit standard deviation of c^7 (not standard deviation of E{c^"f) ) to 
quantify the variance in estimation via simulation. 
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h2 = U{3s, 7s) 
/i3 = U{3s, 7s) 
/i4 = U{3s,9s) 
f = U{0,0.5) 



(56) 



The units of /ii, . . . , /i4 are in sec (s) and / is dimensionless. U{a, b) denotes the uniform distribution 
on [a, b]. Data was generated by samphng at every 0.1 s. 




Figure 21: 200 samples of HRF drawn from a 5 parameter half-cosine parameterization of Haemo- 
dynamic Response Function (HRF) that were used as input to the optimization process in Example 
6. 

The automatic initialization strategy described before was used for initialization. The weights Wi 
were set to 1 and (pi were set to their default values of 0.5. Optimization results are shown in figure 
22 and [23] Once the set of "optimal" HRF capturing functions are found, they can be entered into 
any GLM analysis as follows: 

1. Convolve the experimental EV with each of the "optimal" HRF capturing functions. 

2. Next enter the resulting DM into a GLM analysis and use the contrast cz from the optimiza- 
tion above to capture the "size" of underlying signal optimally. 
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col1:init 

— col2:init 

col3:init 

col4:init 

col5:init 

col1:opt 

col2:opt 

col3:opt 

col4:opt 

col5:opt 



5 6 7 

^ Number of columns in Z 



150 200 
Time (s) -> 



(b) 



250 300 



frac contrast bias 




Figure 22: Example 6: (a) R versus p curve for determining the optimal number of columns in Z. 
Using a cutoff of -Rc = 0.95 the optimal number of columns in Z was determined to be popt = 5. (b) 
The 5 columns of Z were left unconstrained and the contrast was fixed at [1; 0; 0; 0]. The automatic 
initialization strategy described before was used to initialize the 5 columns of Zq (dotted lines) . The 
optimized columns are shown in the same figure using solid lines, (c) Performance curves showing 
the fractional contrast bias C^, contrast variance change w.r.t Gauss-Markov estimate CVa and 
model variance bias Vb (d) In this example = 1 and (f)i = 0.5 indicating an equal weight to bias 
and variance terms during optimization. 
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iter number -> iter number -> 



(a) (b) 



Design matrix X{105|, 1000 simulations at SNR - 1, p - 1 1000 simuiations 




450 



Figure 23: Example 6: (a) Figure showing the evolution of objective function values ^^(Z, cz) over 
algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) Figure 
showing the variation in the step size a over algorithm iterations. Step size controlling parameter 
8 in Algorithm 1 was set to = 2 (c) , (d) For each design matrix (DM) entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows an example of simulated data for DM Xiq^ at 
SNR ^ and the GLM fit using the optimal DM. It also shows the distribution of c^7 over 1000 
simulations. Figure (d) is a summary errorbar plot showing £"(0^7) over 1000 simulations for data 
generated from each DM. The error bars represent unit standard deviation of c^T (not standard 
deviation of E{c^^) ) to quantify the variance in estimation via simulation. 
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7 Application to real fMRI data 

In this section, we describe a case study in detail and show how the approach can be appHed in 
practice. 

7.1 Data acquisition 

This case study describes a buprenorphine infusion scan at 0.2 mg/70kg dosage. Ehmination half- 
hves for intravenous administration of buprenorphine (0.3 mg) range between 1.2 to 7.2 hours 
(mean = 2.2 hours), while the terminal half-life is 3 hours (Bullingham, et al. 1980). Onset of 
buprenorphine (intravenous, 0.3 mg to 0.6 mg dose) is within five minutes and analgesic effects 
lasts for 6 to 8 hours (Downing, et al. 1977). 

In the 25 minute infusion scan, a 5 minute baseline was collected prior to the first infusion of 
buprenorphine or placebo. Four infusions, totaling 8 ml, were performed at minutes 5, 7, 9 and 
11. Each 2 ml infusion was performed at a rate of 0.1 ml/sec and controlled by an automatic 
microinjector (Medrad Spectris, Colombus, OH). 

All data were collected on a 3 Tesla Siemens Trio scanner with an 8-channel phased array head 
coil (Erlangen, Germany). Infusion data were collected using a gradient echo-echo planar pulse 
sequence (GE-EPI) at a 3.5 x 3.5 x 3.5 mm3 resolution. GE-EPI Parameters: Time of Repetition 
(TR) = 2500 msecs. Time of Echo (TE) = 30 msecs. Field of View (FOV) = 224x224, Flip Angle 
(FA) = 90, num of Slices = 41 axial slices and num of Volumes = 600. The acquisition time for 
the infusion scan was 25 mins and 5 sees. Tl-weighted structural images were acquired using a 3-D 
magnetization-prepared rapid gradient echo (MPRAGE) sequence at a resolution of 1.33 x 1.0 x 
1.0 mm3. MPRAGE Parameters: TR = 2100 msecs, TE = 2.74 msecs. Time of Inversion (TI) = 
1100 msecs, FA = 12, num of Slices = 128 sagittal slices (Mugler and Brookeman 1990). 

7.2 Data Analysis 
7.2.1 Preprocessing 



Single subject data analysis was performed using FMRIB Software Library (FSL) (http: //www.J 
[fnirib.ax.ac.uk/fsl). The first two volumes were removed to account for MR signal instability 
in the initially acquired volumes. Raw fMRI data was preprocessed using the following steps 1) 
Skull stripping using a Brain Extraction Tool, 2) Motion correction using FMRIBs Linear Motion 
Correction tool (MCFLIRT), 3) Spatial smoothing with a 5 mm FWHM spatial filter. 



7.2 Data Analysis 
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7.2.2 Model Free analysis 

Following this preprocessing, a projection pursuit analysis using the algorithm ADIS (Automated 
Decomposition Into Sources) was performed to extract spatial maps of minimum entropy to identify 
patterns of activity in the brain. ADIS (http://arxiv.org/abs/0902.4879, arXiv:0902.4879vl 
[stat.CO]) is a probabilistic and constrained projection pursuit software that outperforms conven- 
tional ICA algorithms in several benchmark tests. Data was analyzed as follows: 

1. Dimension reduction via probabilistic PCA. A lower bound for the latent dimension was 
determined via a bootstrap approach. A cross-validation analysis was used to estimate the 
true latent dimension. 

2. The PCA-reduced data was decomposed into a set of minimum entropy spatial maps and their 
associated timecourses using "negentropy" based projection pursuit. The ADIS optimization 
core was used to perform constrained optimization. The resulting z-maps were thresholded at 
z > 3 and z < —3. 

The purpose of running ADIS was to demonstrate the existence of multiple infusion response profiles 
even for single subject data. 
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Figure 24: Some infusion response profiles extracted by ADIS illustrating the variability of the 
signal of interest. The black arrow indicates the approximate timepoint at which the signal starts 
going up from baseline. Note that the individual infusion responses are potentially corrupted by 
the linear drift which was identified as a global component in the brain. 



7.2 Data Analysis 
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ADIS produced a total of 16 components. Figure 24 shows associated timecourses for some min- 
imum entropy spatial maps. The linear drift was found to be present globally throughout the 
brain superimposed on the various response profiles. Thus an example of a potential design matrix 
would be the one such as shown in Figure 27 with the first column a ramped step change modeling 
the infusion response and the second column a covariate of no interest as the linear drift. The 
contrast of interest to the investigator will be [1;0]. Clearly the data contains multiple response 
profiles differing in mainly their rise from baseline. There could be many more variations possible 
in general datasets. In the study of pharmacological fMRI responses, the exact response of the 
brain is not easily modeled because of pharmacokinetics and pharmacodynamics differences be- 
tween the circulating system and the brain. Henc, it is feasible to expect different responses across 
brain structures. It seems appropriate to use a "ramp" model with potential delays to accomodate 
responses in different brain structures. 



7.2.3 Optimal DM computation 

Subsequent to this data "inspection" stage we carried out an analysis to compute the optimal 
design matrix. Two main EVs were proposed for this analysis as shown in 27 The EVl captures 
an "infusion response" while EV2 represents pure "linear drift". For the purposes of optimization 
we proposed 723 potential DMs with the intention of capturing delays upto 180 timepoints. Let 
Xq be the DM shown in |27} Define 

(57) 
(58) 









, 1) shifted to the right by i timepoints 
Xi{:,2) = Xo{:,2) for ah i 


These 723 DMs are as follows 






1. 


The first 180 DMs were 


Xi, 


. . , Xi8o at 1 = [1; 0.5] and cx, = [1; 0] 


2. 


The next 180 DMs were 


Xi, 


. . . , Xi8o at 1 = [-1; 0.5] and cx, = [1; 0] 


3. 


The next 180 DMs were 


Xi, 


. . . , Xi8o at 1 = [1; -0.5] and cx, = [1; 0] 


4. 


The next 180 DMs were 


Xi, 


. . . , Xi8o at 1 = [-1; -0.5] and cx, = [1; 0] 


5. 


DM 721 was Xq at ^ = 


[0;1] and cx, = [1;0] 


6. 


DM 722 was at ^ = 


[0;- 


-1] and cx, = [1;0] 


7. 


DM 723 was Xq at ^ = 


[0;0] and cx, = [1;0] 



(1) to (4) above instruct the optimization process to be sensitive to both "activation" and "deactiva- 
tion" with both "positive" and "negative" linear drift. We also explicitly instruct the optimization 
process to match pure "positive" or "negative" linear drift to an "infusion response" of size using 
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(5) and (6) above. (7) instructs the optimization to match "infusion response" to size in the 
absence of any signal (either infusion response or hnear drift). We choose a conservative SNR of 1 
for the signal of interest and a linear drift amplitude 50% that of the main signal. The computed 
optimal DM and its performance are shown in Figures 25]|26 



For the optimization, we fixed the first two columns of Z to the EVs shown in 27 
was also fixed to be [1; . . . ; 0] corresponding to the main infusion response. 



The contrast 



7.2.4 GLM analysis 

Subsequent to the computation of optimal DM, two GLM analyses were carried out using FSL's 
tool FEAT: (1) using an optimized design matrix (2) using a design matrix containing 3 EVs. The 
2 EVs shown in [27] and an additional EV representing the temporal derivative of the "infusion 
EV". 



8 Results 



Figures [2]- [7]illustr ate the results of validation tests for Algorithm 1 It was found that the maximum 
difference in objective function using Algorithm 1 and the more sophisticated optimization solver 
was of the order of 10~^ (see Table 



8.1 Case studies 1-6 

In Examples 1-4, the size of the optimal DM was chosen to be p = 3. The weights Wi were chosen 
to be 1 for all i to reflect equal likelihood of potential DMs in all Examples. 



8.1.1 Example 1 



Performance curves for Example 1 are shown in Figures 11 - 12 In this example (pi was set to its 
default value of 0.5. Columns 1 and 2 of the Z were fixed during optimization and the unconstrained 
column in Z was initialized by drawing from a uniform distribution C/(0, 1). 

It is seen that the contrast bias \Cf,i\ is maintained at < 0.05 for i = 1, . . . , 36. For i = 37, . . . , 50, 
\Cbi\ increases to around 0.12 for i = 50. At the same time the contrast variance change w.r.t. 
Gauss-Markov estimate CV/^i decreases monotonically from 0.096 for i = 1 to 0.002 for i = 20 and 
becomes negative from i = 21, ... ,50 implying a lower variance than the Gauss-Markov estimator. 
The model variance bias Hi is maintained at < 2 x 10~^ for all i. 



8.1 Case studies 1-6 
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Figure 25: fMRI case study: (a) R versus p curve for determining the optimal number of columns 
in Z. Using a cutoff of Rc = 0.95 the optimal number of columns in Z was determined to be 
Popt = 6. (b) The first 2 columns of Z were constrained and the others were left unconstrained 
during optimization. The contrast was fixed at [1; 0; 0; 0; 0; 0]. The automatic initialization strategy 
described before was used to initialize the 6 columns of Zq (dotted lines). The optimized columns 
are shown in the same figure using solid lines, (c) Performance curves showing the fractional 
contrast bias Cb, contrast variance change w.r.t Gauss-Markov estimate CVa and model variance 
bias Vft (d) In this example Wi = 1 and 4>i was chosen using the automatic initialization strategy 
described before. 



8.1 Case studies 1-6 
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Figure 26: fMRI case study: (a) Figure showing the evolution of objective function values Gff,{Z, cz) 
over algorithm iterations. Notice how the function value stabilizes as convergence is reached (b) 
Figure showing the variation in the step size a over algorithm iterations. Step size controlling pa- 
rameter 9 in Algorithm 1 was set lo 9 = 2. For each design matrix (DM) entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DM. Figure (c) shows the ROC curve for data generated from the 
design matrix Xigo at various SNR values. Figure (d) is a summary errorbar plot showing E{c^^) 
over 1000 simulations for data generated from each DM. The error bars represent unit standard 
deviation of c^7 (not standard deviation of E{c^^) ) to quantify the variance in estimation via 
simulation. 



8.1 Case studies 1-6 



45 



8 RESULTS 




Figure 27: Example infusion design matrix. The top figure shows a typical "infusion response" and 
the bottom figure shows a confounding "linear drift". 



8.1 Case studies 1-6 
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8.1.2 Example 2 



Performance curves for Example 2 are shown in Figures 13 - 14 In this example, was chosen 



automatically as described in section 5.3 Columns 1 and 2 of the Z were fixed during optimization 
and the unconstrained column in Z was initialized at the optimal solution from Example 1. 



It is seen from Figure 13(d) that lower weights (pi were assigned for increasing i (approximately 
20% lower for i = 50 compared to i = 1). With this choice of (pi it was seen that the contrast 
bias \Cbi\ was reduced to < 0.01 for all i < 36. For i = 37 . . . ,50, \Cbi\ increases to a maximum 
of 0.03 for i = 50. The value of CV^i decreases monotonically from 0.345 for i = 1 to 0.07 for 
i = 50 implying an increased variance compared to Example 1. The model variance bias Vf,i was 
maintained at < 4 x 10~^ for all i. 



8.1.3 Example 3 



Performance curves for Example 3 are shown in Figures 15-16 In this example, (pi was chosen to 
be 0.1 for all i. Columns 1 and 2 of the Z were fixed during optimization and the unconstrained 
column in Z was initialized by drawing from a uniform distribution U(0,1). 

It was found that \Cbi\ was maintained at < 0.02 for i = 1, . . . , 41. For i = 42, . . . , 50, \Cbi\ increases 
to 0.055 for i = 50. It was also found that CVaj decreases monotonically from 0.195 for i = 1 to 
0.003 for i = 38 and becomes negative from i = 39, ... ,50 indicating a lower variance than the 
Gauss-Markov estimator. The model variance bias Vbi was maintained < 1 x 10~^ for all i. 



8.1.4 Example 4 



Performance curves for Example 4 are shown in Figures 17 - 18 In this example, (pi was chosen 
to be 0.01. Optimization was initialized using solution found in Example 1. The sample space of 
Z was left unconstrained but the set of potential DMs was augmented to explicitly instruct the 
optimization to enable detection of signals in the presence of confounds, as well as treat "drift" 
signals as "null" data. 

It was found that \Cbi\ was reduced to < 0.02 for all i and CV^i decreased monotonically from 
around 0.32 to around 0.05 for each of the four segments i = 1,...,50, i = 51,..., 100, i = 
101, . . . , 150 and i = 151, . . . , 200. The model variance bias Vu was maintained < 2 x 10~^ for all 
i. 



8.2 MRI case study 
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8.1.5 Example 5 



Performance curves for Example 5 are shown in Figures 19 - 20, This example illustrated a block 
design experiment where the user wants to primarly control bias, (pi was chosen to be 0.01 in this 
example indicating a preferential reduction of bias. The first column of the optimal DM was fixed 
to the primary block EV. The optimization process was initialized using shifted versions of the 
primary EV. 

It was found that the optimal DM reduced \Cbi\ to < 0.0042 for i while CV^i was around 2.38 for 
i = 1, 3, 4, 6, 7, 9, 10 and 2.61 for z = 2, 5, 8, 11 indicating an increased variance relative to the Gauss- 
Markov estimator. The model variance bias Vu was maintained < 2.4 x 10-^ for i = 1, 3, 4, 6, 7, 8, 10 
and < 6.7 x lO^^ for i = 2, 5, 8, 11. 



8.1.6 Example 6 



Peformance curves for Example 6 are shown in Figures 22 - 23 This example illustrated the 
construction of a set of optimal HRF capturing functions that enable capture of locally variable 
HRF functions in the brain. The optimization process was explicitly indicated to match "null" data 
with signal size of " 0" optimally using additional DMs from i = 201, ... ,400 at ^ = 0. (p j was set 



to its default value of 0.5. The size of optimal DM was found to be p = 5 using Algorithm 2 The 



5 columns of optimal DM were left unconstrained during the optimization process and initialized 



using the procedure described in section 5.1 



It was found that the mean absolute value of contrast bias was maintained at \Cu\ < 0.075 and the 
mean value of CVai was maintained at < 0.107 for all HRF shapes i = 1, . . . ,200. For the "null" 
data \Cbi\ = and the mean CVai was maintained at < 0.106 for i = 201, . . . ,400. The model 

bi 



variance bias 14,- was reduced to < 8 x 10 for all i 



8.2 fMRI case study 



This case study deals with the optimal capture of signals for an fMRI infusion study. An initial 
model free exploration of data revealed the presence of multiple infusion profiles differing in their 
time to take off from baseline as illustrated in Figure 24 For the DM optimization, (pi was initialized 



automatically using the strategy describe in section 5.3 The first two columns of Z were fixed to the 



EVs described in Figure 27 The optimal number of columns in Z were estimated using Algorithm 
[2] to be p = 6. DM optimization was initialized automatically using the procedure described in 
The set of potential DMs were augmented with additional DMs (total 723 potential 



5.1 



section 

DMs) to guarantee detection of "positive" and "negative" activation in the presence of confounding 
drift as well as to match both the "drift" and "null" data to a signal size "0". 
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Performance curves for the optimal DM are shown in Figures 25 - 26 It was found that the mean 
absolute value of \Cbi\ was reduced to 0.0057 and the mean value of CV^i was 5.70 for i = 1, . . . , 720 
representing the "non-null data" . For the "null" data from i = 721, . . . , 723, we found \Chi\ = and 



the mean CVaj was 7.28. The model variance bias Vu was reduced to < 1.7 x 10 ^ for all i. Figure 



26(c) shows the ROC curve for the detection of signal generated from Xi^q versus the "null" data 
generated from a "pure" drift signal at SNR [0;0.5] over 1000 simulations from each. We chose 
Xiso because it is an extreme DM that produces the highest absolute contrast bias \Chi\ ~ 0.063. 



A simple decision rule based on the T statistic calculated from equation 13 was used for positive 
activation detection to generate the ROC curve. For a cutoff T-statistic tc- 

Decide activation if: T(7, ffi; Z; cz) > tc (59) 
Decide null if: T{^, di; Z; cz) < tc (60) 

where 7 and di are the parameter vector and residual standard deviation respectively estimated 
using the optimal DM Z and contrast cz- It is found that for SNR of ^ = [2; 0.5], a sensitivity 
of 93.5% was obtained at a specificity of 94.2% and for SNR of ^ = [3; 0.5], a sensitivity of 98.8% 
was obtained at a specificity of 98.7%. The real fMRI data was found to have approximately an 
SNR > 4 for the primary infusion response indicating a high sensitivity and specificity of signal 
detection using the optimal DM Z. 

Figure [29] shows the t-stat image corresponding to the "infusion response" overlaid onto the anatom- 
ical image (transformed into the "fmri" space) of the subject. For illustration purposes, we extracted 



raw timecourses from 6 sample points in the brain the details of which are shown in Table [8^ Fig- 
ure 



30 -|32]show the raw timecourses from real fMRI data with full and partial model fits obtained 
using the optimal DM Z. For illustration purposes we also show the full and partial model fits 
obtained using a "naive" DM such as that using the temporal derivative of the "infusion response" 
as a covariate. Sample point 1 represents a canonical 0-delay infusion response, while sample point 
2 represents a 200 timepoint delayed infusion response. Sample points 3 and 4 represent infusion 
responses with delays of around 100 timepoints while sample points 5 and 6 represent infusion 
responses with delays of around 150 timepoints. It was found that using the optimal DM Z it was 
possible to detect all infusion responses (delayed or not) in a robust and unbiased fashion. 



9 Discussion 



The objectives of this paper were the development of a theoretical framework and a numerical 
algorithm to enable optimization of design matrices used in fMRI analyses. This optimization 
framework (SMART) allows a user to optimize an objective function capturing the bias-variance 
decomposition over a set of potential design matrices. Within this very general framework it 
is possible to specify weights measuring the expected frequency of occurrence of various design 
matrices as well as preferentially control for bias or variance, if desired. The sample space for 
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Figure 28: Figure shows the variation of False positive rate (1 - Specificity) and True positive rate 
(Sensitivity) with the T statistic threshold for data generated as follows: (1) "Activation" generated 
at various SNRs from an extreme DM - Xi^o, that produces the highest absolute contrast bias 
\Cb\ ~ 0.063 of all 723 DMs used in optimizaton (2) "Null" data generated from a "pure drift" 
signal at SNR [0; 0.5] which is a covariate of no interest and hence should be matched to a signal 
size of "0". 



Sample 


Coords 


t-stat Opt DM 


t-stat Der DM 


1 


[42,41,29] 


4.84 


8.56 


2 


[40,20,25] 


4.06 


-4.39 


3 


[42,28,24] 


5.89 


-0.86 


4 


[36,16,24] 


4.45 


-2.38 


5 


[18,26,24] 


3.27 


-1.67 


6 


[36,11,22] 


4.35 


-1.17 



Table 2: Table shows 6 sample voxels, their co-ordinates and the GLM t-stat values for the contrast 
representing the "size" of infusion EV using the optimal design matrix (Opt DM) and the design 
matrix with the temporal derivative (Der DM). 
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Figure 29: fMRI data was analyzed using GLM with the optimal DM. Figure shows t-stat map for 
the contrast of interest [1; 0; 0; 0; 0; 0] representing the "infusion response". The anatomical image 
for the sample subject was transformed to native space for t-map display purposes. The t-map 
was thresholded at an uncorrected threshold of 2.3. From the ROC curve shown in Figure 28, a 
specificity of ~ 99% and a sensitivity of > 98% is obtained at SNRs > 3. Real fMRI data had an 
SNR > 4 and hence a cutoff of 2.3 on the t-statistic map performs well. 
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Figure 30: fMRI data extracted from sample voxels 1 and 2. A GLM analysis was run using the 
optimal DM and a DM containing the 2 EVs shown in [27] and the temporal derivative of the 1st EV 
(i.e., the infusion ev). Figures show the full model fit and partial fit corresponding to the contrast 
of interest. Figures (a) and (c) are for the optimal DM while (b) and (d) are for the "temporal 
derivative" based DM. 
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Figure 31: fMRI data extracted from sample voxels 3 and 4. A GLM analysis was run using the 
optimal DM and a DM containing the 2 EVs shown in [27] and the temporal derivative of the 1st EV 
(i.e., the infusion ev). Figures show the full model fit and partial fit corresponding to the contrast 
of interest. Figures (a) and (c) are for the optimal DM while (b) and (d) are for the "temporal 
derivative" based DM. 
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(d) 



Figure 32: fMRI data extracted from sample voxels 5 and 6. A GLM analysis was run using the 
optimal DM and a DM containing the 2 EVs shown in [27] and the temporal derivative of the 1st EV 
(i.e., the infusion ev). Figures show the full model fit and partial fit corresponding to the contrast 
of interest. Figures (a) and (c) are for the optimal DM while (b) and (d) are for the "temporal 
derivative" based DM. 
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optimization is controlled via constraints on the columns of the optimal design matrix and the 
associated contrast. 

We validated our numerical algorithm by comparing it with a more sophisticated optimization 
solver in two validation tests. We proposed a strategy for choosing the number of columns in 
the optimal design matrix as well as strategies for automatic selection of initial point for the 
optimization algorithm and for choosing local bias-variance weightings <j)i based on user-specified 
objectives. 

We then illustrated the application of the technique by considering 6 case studies. Our aim in 
these examples was to illustrate the degree of control that a user has in terms of controlling the 
optimal solution based on user specifications. The first 4 examples illustrated the variation of 
various control parameters in the algorithmic framework as applied to a phMRI study. Example 

5 illustrated the application of the proposed technique to a block design case study. Example 6 
addressed the important issue of locally variable HRF functions in fMRI data. The goal in example 

6 was to come up with a set of HRF-modeling functions to capture a range of HRF shapes while 
maintaining optimality with respect to bias and variance of the primary contrast capturing the 
response amplitude of the underlying EV. 

We examined how the optimized design matrix derived using SMART compared to an alternative 
in which a temporal derivative is added as an additional EV to capture variation in signal onset. 
Figure [33] compares the bias in parameter estimate at SNR = 1 over 1000 simulations for the 
temporal derivative approach as well as for the four SMART phMRI design matrices. A signifi- 
cant reduction in bias is observed when using the optimized design matrices in comparison to the 
temporal derivative approach, with Examples 2-4 performing best. 

Finally, we applied the technique to a real infusion phMRI dataset. First, an "optimal" design 
matrix was derived using SMART. Next, we examined the bias- variance properties and generated 
ROC curves for the estimated design matrix. Finally, a GLM analysis was performed on the phMRI 
dataset using this design matrix. It was found that this design matrix achieves very high (> 98%) 
sensitivity and specificity of signal detection over a range of signal variations in the data. 

The use of a set of basis functions has been recommended before for capturing variability in fMRI 
signal shapes. For example, as relates to the capture of locally varying HRF shapes the work 
by FMRIB analysis group on FLOBS [TB] is notable in that it constrains the basis set to have 
sensible HRF shapes. These approaches allow one to test the hypothesis about the presence or 
absence of signal by using F-statistics. However, limitations of such approaches include (1) the 
inability to measure the amplitude of the HRF signals and (2) The inability to combine amplitude 
measures from single-subject analyses into a group level analysis. Because the signal amplitude is 
never measured in these basis function approaches it is also not controlled for bias and variance. In 
contrast, the HRF-capturing functions developed in Example 6 illustrate how variable signal shapes 
can be captured using a single contrast while optimizing the bias and variance of the resulting signal 
amplitude estimates as per user defined objectives. 
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In this paper we considered an inverse problem. Instead of proposing a statistical estimator and 
studying its bias/variance properties, we define an objective function that captures the bias- variance 
decomposition over the set of potential signal shapes in the data and then explicitly optimize this 
objective function for both a design matrix and a contrast. This results in an optimized design 
matrix and contrast that automatically captures the amplitude of signals of interest. The resulting 
PE values can be easily carried over for a group level analysis. 



1 000 simulations, at SNR - 1 




DM no- 



Figure 33: For each design matrix (DM) Xi from Examples 1, 2, 3 and 4 entered into optimization, 
1000 simulated data-sets were generated at SNR ^. A GLM analysis was run on each of these 
data-sets using the optimized DMs for Example 1, Example 2, Example 3, Example 4 as well as 
a DM Xd such that the first two columns of X^ are the same as the first two columns in Z 
from Example 1 but the 3rd column is the temporal derivative of X £>(:,!). Figure above shows a 
summary errorbar plot showing E{c^^) over 1000 simulations for data generated from each DM 
and analyzed via GLM using optimized DM's from Example 1, 2, 3, 4 as well as using X/j. The 
errorbars represent the standard deviation of E{c^^) so that bias can be statistically compared 
across the four cases. 
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10 Conclusion 

Wc developed a theoretical framework (SMART) to enable calculation of optimal design matrices 
for fMRI analyses that simultaneously enables detection of multiple signal responses as well as 
controlling for bias and variance in the GLM estimation. This is achieved by optimizing for the 
contrasts and basis functions of the design matrix to minimize the bias- variance decomposition over 
a set of design matrices capturing anticipated variability in the data. Although the technique was 
developed with motivation from fMRI, its development is quite general and appears to be applicable 
to a variety of problems from many other disciplines. 



11 Appendix 

11.1 Derivation of gradient equations 

Consider the matrix A G R"^"*, square matrices B,C e R"^" and vectors x G R" and y,z e R"*. 
Recall that the matrix derivative with respect to matrix A of a scalar quantity / is denoted by ^ . 
The ijth. element of |^ is defined as: 



dA 



ij 9aij 



where Uij is the ijth element of A. 
Below we note some basic identities: 



d_ 
dA 



(x'^Ay) 



d_ 
dA 



{y'^A^x) 



_d_ 
dB 



{x^B-^x 



[-B-^xx^B-^].. 



d_ 
dA 



{y^iA^A)-'z 



d_ 

dB 

' d_ 
dB 



= [-AiA^A)-\zy^ + yz^)iA^A)-% 
trace(5C) = [C^].. 

- ij 

trace(S^C) = [C]^,- 



(61) 



(62) 
(63) 
(64) 
(65) 
(66) 
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trace (BC) = trace (CB) 



(67) 



Recall, equation |33[ the objective function under consideration: 

m 

'^^iWi + tr {PzH^y^H'^) 



cliZ^Z)-^Z^H^BH^Z{Z^Zy^cz - 2cl{Z^Z)-^Z^H^B^ + ^^(^ " 20^)(c£ A/a,)' (68) 



i=l 



The matrix gradient with respect to Z can be written as: 

m 

Y '^<t>iWi + tr {PzH^v^H 



§^G,{Z,cz) = ^{cl{Z^Z)-'cz] 



+ci(Z^Z)-icz 



d_ 
'dZ 



Y '^4>iWi + tr {PzH<^>v^H^) 



,j=i 



^ {4{z^ z)-^ Z^ H<^bH^ z{z^ zy^cz} 



dZ 



d 



+ ^{-2c'z{Z^Z)-^Z^H^B^] 



For the sake of presentation clarity, we define the following smaller terms in equation 

Terml = ^ {cl{Z^ Zy'cz} 



Term 2 



d_ 
dZ 



{[tr {PzH<^v^H^)]} 



Term 3 = ^ {cl{Z^ Z)-^ Z^ H<^bH^ Z{Z^ Zy^cz} 



d 



Term 4 = ^ {-2cl{Z^Zy^Z^H<^>Be} 



d_ 
dZ 



Term 5 = ^ < ^M'^ " 2 </<*)( c^^ A/ 

> i=l 



(69) 



(70) 



11.2 Term 1 
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11.2 Term 1 

From |64] we have: 

11.3 Term 2 



A {cliZ^zr'cz] = -Z{Z^Z)-\2czcl){Z^Z)-' (71) 



^- { [tr {PzH'^v^H^)] } = tr {-^{Z^Zy^Z^H^v^H^ 



dZij \ dZij 



+tr (-Z^ {{Z^Zy^} Z^H-^v^H^^ +tr (^-Z^Z^ Zy^^H'^yJ^H^ 

Again, 



(72) 



tr (^~{Z^Z)-'Z^H^vJ:H^^ = - [HJ:^^^H^Z{Z^Z)-\ (73) 



and 

7T \ / avT 



tr (^-Z{Z^Z)-^^H^v^H^^ = tr (^-"^H^y^H^ Z{Z^ Zy^^ = - [H^yEH^ Z{Z^ Zy^]^^ 

(74) 



Also, 



tr (-^ {(Z^Zy'} Z^H^V^H^) 

tr ( z{z^zy'^z{z^zy'z^H<^y^H^ + z{z^zy'z^^{z^zy^z^H<!>y^HA 

V oZij aZij ) 

tr ( ^z{z^zy'z^H<!>y^H^z{z''zy^ + ^{z^zy^z^H<!>vj:H''z{z''zy'zA 

V c)Zij oZij J 

= [z{z'^zy'^z^H<^yT.H'^z{z'^zy'^ + z{z'^zy^z^HT.^<^>J^H^z{z^zy^].. (75) 



Combining 102 103 74 and 75 we get: 
d 

— { [tr [PzH^yTsH^)] } = -Ht^^lrH'^Z{Z'^Zy^ + Z{Z'^ Zy^ Z"^ H^yT.H'^ Z{Z'^ Zy^ 

+z{z^ zy^ z^ H^^'^lrH^ z{z^ zy^ - H<^v^H'^z{z'^zy^ 
= -{H^yT.H^ + i7s'^$^if'^)z(z^z)-i + z{z'^ zy^ z"^ [H^yT.H'^ + ht,'^^'{^h'^)z{z'^ zy^ 

= -PziH'^yT.H'^ + Hlf^'^H'^)Z[Z'^Zy^{7Q) 
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11.4 Term 3 

d 



{cI{Z^Z)-^Z^H^bH^Z{Z^Z)~'cz] 



= c^TT^ {{Z'' Z)-^] Z'^ H^bH^ Z{Z'^ Z)-^cz 

O/jij 

+ci(Z^Z)-i J- [Z^] H^bH^Z{Z^Z)-^cz 
+c%{Z^Z)-'Z^H^bH^j^ {z} {z^zy'cz 

+cI{Z^Z)-^Z^H^bH^Z^ {{Z^zr^]cz (77) 



Applying 89 and 64 repeatedly to each of the above terms we get: 

A [diZ-^zr^Z^H^BH^ZiZ^Zr^cz] 
= -Z{Z'^ Z)-^{Z^ H<^>bH^ Z{Z'^ Zy^czcl + czcliZ^ Z)-^Z^ H<^>]^H^ Z){Z'^ Z)-^ 

+h^bH^ z{z^ zy^czcliz^ Z)~^ 
+H<^^H^ Z{Z^ zy^czcliz^ Z)~^ 
-Z{Z^ Z)~-^{czcl{Z^ Z)~^ Z^ H<^bH^ Z + Z^ H<^]^H^ Z{Z'^ Z)-^czc^){Z'^ Z)-^ (78) 

Since <I>b is diagonal, this can be simplified to: 

A {cI{Z^Z)-'Z^H<!>bH^Z{Z^Z)-'cz} 

= -2Z[z'^ z)-^{z'^ h^bH'^ z[z'^ zy^czcl + czc^z^z'^ zy^ Z^ H^'^bH'^ z)[z'^ z)-^ 

+2h^bH^ z{z^ z)-^czcl{z^ zy^ 

(79) 



11.5 Term 4 



_d 
dZ, 



{-2cl{Z^zy^Z^H<^Bi} 



= {-2ciJ-{(^^Z)-i}Z^w} 
+ |-2ci(Z^Z)-i^{Z^}w} 



(80) 



11.6 Term 5 



60 



11 APPENDIX 



Application of 89 and 64 gives: 



{-2){-Z){Z^Z)-\z'^H'^Bic^ + czi'^^lH^Z){Z'^Z)~^ 

+{-2)H^Bicl{Z^Z)-^ 



(81) 



11.6 Term 5 



Term 5 is a constant w.r.t Z and cz and so the gradient w.r.t Z and cz is 0. 



11.7 Combining terms 



Combining 98 101 76, 79 and 81 gives: 
d 



dZ 



G^{Z,cz) = [-Z{Z^Z)-\2czc^z)iZ^Z)-'] 



J2 '^<i>iWi + tr [PzH^v^H^) 



1=1 



+cl{Z^Z)~^cz [-Pz{H^v^H^ + H^^<^'^H^)Z{Z^Z)-^] 

[-2Z[Z'^ Z)~^{Z'^ H^bH'^ Z{Z'^ zy^czcl + czc^iz'^z)-^z'^H<^'^H'^z){z^zy'^] 

+ [2H^bH^Z{Z^Z)-^czcI{Z'^Z)-^] 
[{-2){-Z){Z^Z)-\Z^H<^>Bic% + czf^lH^Z){Z^Zy^ + {-2)H^b^cI{Z^ Z)-^] 



(82) 
(83) 



Noting that ^b-, and S are diagonal and given the fact that diagonal matrices commute, 
rearrangement gives: 



d_ 
~dZ 



G4,{Z,cz) = [-Z{Z^ Zr\2czJz){Z'^ Z)-^] 



2(^iWi + tr {PzH^v^H^) 



i=l 



-2 {Jz{Z^Z)-^cz) [Pz{H<^v^H^)Z{Z^Z)-^] 

-2Z[z'^z)-^ [czc^z^z'^zy^z'^H^BH'^z] {z"^ zy^ 
-2Z{z'^zy^ [z'^H^BH'^z^z'^zy^czcl] {z'^zy^ 
+ [2h^bH^ z{z'^ zy^czc^iz^ zy^] 
+2Z{z'^ zy\z'^ H^Bic^)iz'^ zy^ 
+2Z{z^ zy^{czi^<^lH'^ z){z^ zy^ 

-2H<^>Bicl{Z'^Zy^ 



(84) 
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Thus the gradient with respect to cz is easily computed as: 
d 



dcz 



G^{Z,cz) = 2{Z^Z)-^cz 



"^2(^iWi + tr{PzH^V^H^ 



i=l 



+2[Z^ Z)-^ Z'^ bH'^ Z[Z^ Zy^cz - 2(Z'^Z)~^Z^H^B^ 



(85) 



Equations 84 and 85 are the same as 36 and 37 respectively. 



11.8 Exact solver details 



Our optimization algorithm solves the general problem: 

min 3,f{x) 
s.t. Ci[x) = 0, i = 1, 2, . . . , m 
s.t. 5^(3:) > 0, j = l,2,...L 

where x G R^. 

We convert the inequality constraints into equality constraints via slack variables as follows: 

gj{x) - Sj = 

sj>0, j = l,2,...L 



(86) 
(87) 



(89) 
(90) 



Thus the optimization problem becomes: 



min f{x) 
s.t. Ci{x) = 0, 



s.t. gj{x) 



0, 3 



1,2, ... ,m 
= 1,2, ...L 



>0 



(91) 
(92) 
(93) 
(94) 



This problem is now an equality constrained problem where the inequalities have been replaced 
by the bound constraints on the slack variables. Thus it suffices to consider equality constrained 
problems with bounds on independent variables as follows: 



min f{x) 
s.t. Ci[x) =0, i 

S.t. Jij X j *^ tiij « % 



1,2, ... ,m 
= 1,2, ...n 



(95) 
(96) 
(97) 



where x £ R^. 
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Our code uses a trust region based augmented lagrangian approach to solve these bound constrained 
problems following closely the LANCELOT software package [5], [3]. The augmented lagrangian 
function for the above problem is defined as: 

m m 

C{x, A, fi) = fix) - AiQ(x) + I J] c,(x)2 (98) 

1=1 i=l 

At each outer iteration k, given current values of A'^ and we solve the subproblem: 



min £(x, A^,^fc) (99) 
s.t. li<Xi< Ui (100) 



If P is the projection operator defined as 

[P{z,Uu)l-- 



''I 

Zi 
Ui 



if Zi < k 
if h ^ Zi < Ui 
if Zi > Ui 



then the Karush-Kuhn- Tucker (KKT) optimality condition for 99 is given as [3]: 



X — P{x — V xC{x, A ' , iJ,k),l, u) = 



(101) 



(102) 



The outer iteration code is given in Framework 1. Note that the penalty parameter fik is updated 
based on a feasibility monitoring strategy that allows for a decrease in if sufficient accuracy is 
not achieved in solving the subproblem |99[ 

At each inner iteration we form a quadratic approximation to the augmented lagrangian and ap- 
proximately solve the inequality constrained quadratic sub-problem: 



min p -p^Vl^£{x, A, fj.)p + Vx-£(x, A, n)'^p (103) 

s.t. k < Xi < Ui (104) 
s.t. IIpIIoo < A (105) 

The inner iteration code uses non-linear gradient projection [2j followed by Newton-CG-Steihaug 
conjugate gradient iterations [I4j. Quasi-Newton updates are performed using either SRI [4, (rec- 
ommended for non-convex functions) or BFGS [Ij (recommended for convex functions). For very 
large problems, we switch to the limited memory variants |Tl] of these quasi-Newton approxima- 
tions. The algorithm details are given in Framework 2. The trust region update code is based on 
a standard progress monitoring strategy [ :12| and is given in Framework 3. 



11.8 Exiici i^olvcr dclnils 



(33 



11 APPENDIX 



Algorithm 4 Fl: Outer Iteration 

Require: Initial point Xinit, A°, /Uo, 6^ G (l,oo), 6i G (0,1) 

1: Choose tolerances r/*^ and ri*^ad- '^^^ default is ry*^^ = 11*^.^^ = le— 6. . 

2: H = llo, Vcon = '^/l^'^, Vgrad = V/^O 

3: for A; = 0,1,2,... do 

4: found = 

5: while found / 1 do 

6: Try to find such that 

\\xk - P{xk - VxC{xk, A'', /Ufe), u)||oo < Vgrad via F2 using starting point as Xk-i- 

7: if above step is completed successfully then 

8: Set found = 1 

9: else 

10: A'^+l = Afe 

11: = (^lll'k 

12: Vcon = '^/iJ^k^ 

13: Vgrad = V/^fe 

14: end if 

15: end while 

16: if \\c{xk)\\oo < Vcon then 

17: if ||c(xfe)||oo < V*con and 

\\Xk - P{Xk - Va;>C(xfc,A'=,0),Z,«)||oo < V*grad 

18: Stop and return current solution Xk- 

19: end if 

20: A'=+^ = Afc - llkc{xk) 

21: /Xfe+i = ptfc 

22: Vcon — Vcon/ l^k+l 

23: Vgrad ~ Vgrad/ 

24: else 

25: A'^+l = Afc 

26: = ^h^fc 

27: Vcon = '^-/l^k^ 

28: 77gTOd = V/^fc 

29: end if 

30: end for 
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Algorithm 5 F2: Inner Iteration 

Require: jmax, rjgrad, A, I, u, A*^, /ik, V e (0, 1), flag 

1: found = 

2: X = Xfc_i, J = 1 

3: Compute, y = Vx^{x, A'^, //fe) 

4: Estimate B = V'^^C{x, X'^, fik) using BFGS, SRI or limited memory BFGS, limited memory 

SRI quasi Newton Updates. 
5: while found ^ 1 and j < jmax do 
6: Calculate the Cauchy point pc for problem: 

\ rp rj-i 

min p -p Bp + g p (106) 

s.t. I - X <p <u- X (107) 

s.t. IHloo < A (108) 

using non-linear gradient projection and calculate the current active set A. Let be the 
unit vector with 1 at position i and zeros elsewhere. If zi,Z2,...% ^ A then let Q = 

^il 1 ) • • • ) Cj^] . 

7: g = Q'^{g + Bp^) and B = CfBQ 

8: Compute the approximate solution v to the problem 

min ^ ]^v^Bv + g^v (109) 

s.t. I — X < Pc + Qv < u — X (110) 
s.t. IIpc + Q^^IIoo < A (111) 

using truncated conjugate gradient iteration (Newton-CG, Steihaug). If flag = 1 use pre- 
conditioned Newton-CG using the inexact-modified Cholesky factorization. 

9: Compute p = Pc + Qv 

10: Calculate 6c = C{x) — C{x + p), 6m = 0.5p^Bp + g^p and p = 6c/ 6m 
11: if p > f] then 

12: X = X + p 

13: end if 

14: Compute new trust region radius A using Framework F3. 

15: Compute, g = VxC,{x, X^, Pk) H p > rj holds otherwise use the previous value. 

16: Estimate B = V^^£(x, X^,pk) using BFGS, SRI or limited memory BFGS, limited memory 

SRI quasi Newton Updates. Do the update even if p < 77. 

17: if ||x - P{x - VxJO-{x,X'',pk)J,u)\\oo < Vgrad then 
18: found = 1 

19: end if 

20: i = i + 1 

21: end while 
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Algorithm 6 F3:Trust Region Update 
Require: p, p, A 

1: if p > 0.75 then 

2: if I IpI loo < 0.8 A then 

3: A = A 

4: else 

5: A = 2A 

6: end if 
7: end if 

8: if 0.1 < p < 0.75 then 

9: A = A 

10: else 

11: A = 0.5A 
12: end if 
13: return A 



11.9 Half-cosine pcirameterization of HRF 



Here we describe the equations used to generate plausible HRF shapes via a 5-parameter half-cosine 
parameterization. 



This parameterization is defined as follows: 





HRF{t) 



cos 



COS 



2hi 



+ 



h3 



{hi +h2-t) 



/cos TT - ^{t -hi-h2- hs) 







if 0<t<hi 

if hi <t <hi + h2 

if hi + h2 <t < hi + h2 + h3 (112) 

if hi + h2 + hs < t < hi + h2 + hs + h4 
if t > hi + h2 + h^ + hi 
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Figure 34: 5 parameter half-cosine parameterization of Haemo dynamic Response Function (HRF). 
hi controls the time to first rise, /i2 controls the time to peak, controls the time to undershoot 
maximum and /14 controls the time to return to baseline. The amplitude of undershoot is controlled 
by the parameter / while the height of rise from baseline is fixed at 1. See 112 for the exact equation. 
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